{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 机器学习项目实践"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By Allen.Huang"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 一、案例说明"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.洛杉矶房价预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "（结构化数据的）回归问题\n",
    "\n",
    "### 将要涉及到的算法：\n",
    "\n",
    "线性回归：\n",
    "\n",
    "    梯度下降法\n",
    "\n",
    "    牛顿法\n",
    "\n",
    "    最小二乘法：矩阵|极大似然估计推导--->针对y=Ax+b这种形式的回归问题可以得到最优解\n",
    "\n",
    "SVR\n",
    "\n",
    "XGBoost\n",
    "\n",
    "RandomTree\n",
    "\n",
    "DecisionTree\n",
    "\n",
    "NN\n"
   ]
  },
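  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the least-squares item above, assuming NumPy is available. The synthetic data, the true slope and intercept, and the names `x`, `y`, `X`, and `theta` are illustrative assumptions, not part of the original case study. It fits `y = a * x + b` with `np.linalg.lstsq`; no iteration or learning rate is needed, unlike gradient descent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Synthetic data (assumed for illustration): y = 3x + 2 plus small noise\n",
    "rng = np.random.default_rng(0)\n",
    "x = rng.uniform(0, 10, size=100)\n",
    "y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=100)\n",
    "\n",
    "# Design matrix with a column of ones so the intercept b is fitted too\n",
    "X = np.column_stack([x, np.ones_like(x)])\n",
    "\n",
    "# Least-squares solution of X @ theta ~= y\n",
    "theta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)\n",
    "a, b = theta\n",
    "print('slope a =', a, 'intercept b =', b)"
   ]
  },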
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.泰坦尼克生还者预测"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "（结构化数据的分类）问题\n",
    "\n",
    "Sklearn.GBDT\n",
    "\n",
    "XGBoost\n",
    "\n",
    "LightGBM"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3.图像分类(MNIST/CIFAR-10)\n",
    "\n",
    "SVM--两层的神经网络\n",
    "\n",
    "FCNN--全连接神经网络\n",
    "\n",
    "CNN\n",
    "\n",
    "VGG16\n",
    "\n",
    "GoogleNet\n",
    "\n",
    "ResNet\n",
    "\n",
    "LeNet\n",
    "\n",
    "AlexNet\n",
    "\n",
    "图像分类的应用:\n",
    "\n",
    "医学上细胞分类\n",
    "\n",
    "户型图\n",
    "\n",
    "行人/车辆分类/道路指示牌分类\n",
    "\n",
    "鉴黄\n",
    "\n",
    "农作物分类\n",
    "\n",
    "肺癌胸片分类"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.图像检测(VOC/COCO)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "检测算法：\n",
    "\n",
    "Sliding-Window k * k图像 滑窗法 O(k^4) 分类是O(1)\n",
    "\n",
    "ROI推荐算法-X--已经过时\n",
    "\n",
    "RCNN\n",
    "\n",
    "Fast-RCNN\n",
    "\n",
    "Faster-RCNN\n",
    "\n",
    "SSD\n",
    "\n",
    "Yolo\n",
    "\n",
    "Mask-RCNN--Detection+Segmentation"
   ]
  },
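  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A small sketch of why exhaustive sliding-window detection is expensive. It assumes that every axis-aligned rectangle inside a k x k image counts as a candidate window, which is one reading of the O(k^4) figure above. The function name `count_windows` is hypothetical. The brute-force count is compared with the closed form `(k * (k + 1) // 2) ** 2`, which grows as `k ** 4`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def count_windows(k):\n",
    "    # Count every axis-aligned rectangle inside a k x k image by brute force\n",
    "    count = 0\n",
    "    for top in range(k):\n",
    "        for bottom in range(top, k):\n",
    "            for left in range(k):\n",
    "                for right in range(left, k):\n",
    "                    count += 1\n",
    "    return count\n",
    "\n",
    "for k in (4, 8, 16):\n",
    "    # Closed form for the same count; its leading term is k ** 4 / 4\n",
    "    closed_form = (k * (k + 1) // 2) ** 2\n",
    "    print(k, count_windows(k), closed_form)"
   ]
  },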
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 二、人工智能的Pipeline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "人工智能-基于数学架构的算法方法论\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数学架构：\n",
    "\n",
    "微积分--梯度下降、牛顿法--模型优化的理论依据\n",
    "\n",
    "概率论--拟合与预测之间关系--归纳和总结之间的关系--机器学习的基础方法论--极大似然估计\n",
    "\n",
    "线性代数--线性代数是求解概率关键参数的工具--矩阵运算--矩阵乘法和求逆运算--BP算法--若干矩阵的乘法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[[1,2,3],\n",
    "[4,5,6],\n",
    "[10,11,12]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[[4,5,6],\n",
    "[7,8,9],\n",
    "[1,2,3]]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "人工智能=人工+智能->没有人工没有智能"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据标注-标注平台-众包人员"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "数据工程/大数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "输入来源：市场/产品"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.训练集哪里来？\n",
    "\n",
    "数据工程：爬虫-scrapy,phatomjs,AirFlow|Oozie-数据处理\n",
    "\n",
    "### 2.数据如何变成训练集？\n",
    "\n",
    "数据清洗-Spark...MapR-HDFS/Ceph/Gluster\n",
    "\n",
    "数据仓库-Hive+DFS/Postgres+DFS\n",
    "\n",
    "数据挖掘-Sprak,python,scala-分析数据(我们距离要去训练的目标有多远?)\n",
    "    \n",
    "    数据短缺--填充-数据科学\n",
    "    \n",
    "    特征不充分-特征工程\n",
    "    \n",
    "    EDA-探索性数据分析--重要性会强过机器学习本身\n",
    "    \n",
    "### 3.数据标注\n",
    "\n",
    "数据集市-各种不同需求产生的训练集\n",
    "\n",
    "### 4.模型训练-(各种算法)\n",
    "\n",
    "### 5.模型评测\n",
    "\n",
    "模型工程-模型的压缩、剪枝、蒸馏、转换\n",
    "\n",
    "应用层-AIaaS-Flask+Keras/Flask+Pytorch-QPS-一块 * * 的显卡，最多每秒有多少张运算\n",
    "\n",
    "数据回流-应用层处理后的数据->数据采集层\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
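  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A rough sketch of how the QPS figure mentioned above could be measured. It assumes a stand-in `fake_predict` function in place of a real Keras or PyTorch model call; that function, `measure_qps`, the sample size, and the request count are all hypothetical. A real measurement would call the served model (and typically batch requests on the GPU), so the number printed here says nothing about actual hardware."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "def fake_predict(image):\n",
    "    # Stand-in for a real model call (assumption); it just sums the pixels\n",
    "    return sum(image)\n",
    "\n",
    "def measure_qps(predict, sample, n_requests=1000):\n",
    "    # Time n_requests sequential calls and report requests per second\n",
    "    start = time.perf_counter()\n",
    "    for _ in range(n_requests):\n",
    "        predict(sample)\n",
    "    elapsed = time.perf_counter() - start\n",
    "    return n_requests / elapsed\n",
    "\n",
    "sample = [0.5] * 784  # one flattened 28 x 28 image (assumed size)\n",
    "print('approx. QPS:', measure_qps(fake_predict, sample))"
   ]
  },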
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4rc1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
